Eigen: a Spectral Approach to the Integration of Functional Genomics Annotations for Both Coding and Noncoding Sequence Variants

Authors

  • IULIANA IONITA-LAZA
  • KENNETH MCCALLUM
  • BIN XU
  • JOSEPH BUXBAUM
Abstract

Over the past few years, substantial effort has been put into the functional annotation of variation in human genome sequence. Indeed, for any genetic variant, whether protein coding or noncoding, a diverse set of functional annotations is available from projects such as Ensembl, ENCODE and Roadmap Epigenomics. Such annotations can play a critical role in identifying putatively causal variants among the abundant natural variation that occurs at a locus of interest. The main challenges in using these various annotations include their large numbers, and their diversity. In particular, it is not clear a priori which annotation is better at predicting functionally relevant variants. It is therefore desirable to integrate these different annotations into a single measure of functional importance for a variant. Here we develop an unsupervised approach to derive such a meta-score (Eigen), that, unlike most existing methods, is not based on any labelled training data. Furthermore, the proposed method produces estimates of predictive accuracy for each functional annotation score, and subsequently uses these estimates of accuracy to derive the aggregate functional score for variants of interest as a weighted linear combination of individual annotations. We show that the resulting meta-score has better discriminatory ability using disease associated and putatively benign variants from published studies (for both Mendelian and complex diseases) compared with the recently proposed CADD score. In particular, we show that the proposed meta-score outperforms the CADD score on noncoding variants from GWAS and eQTL studies, noncoding somatic mutations in the COSMIC database, and on de novo coding mutations in epilepsy and autism studies. Across varied scenarios, the Eigen score performs generally better than any single individual annotation, representing a powerful single functional score that can be incorporated in fine-mapping studies.
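The abstract describes the meta-score only as a weighted linear combination of individual annotations, with weights estimated without labelled data. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes, purely for illustration, that the weights are taken from the leading eigenvector of the annotation correlation matrix. The function name `spectral_meta_score`, the simulated data, and the effect sizes are all hypothetical.

```python
import numpy as np

def spectral_meta_score(annotations):
    """Combine annotation columns into one score per variant.

    Illustrative assumption: weights are the leading eigenvector of the
    correlation matrix of the standardized annotations. No labels are used.
    """
    X = np.asarray(annotations, dtype=float)
    # Standardize each annotation so that scale differences do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order; the last column of the
    # eigenvector matrix belongs to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(corr)
    weights = eigvecs[:, -1]
    # Eigenvectors are defined up to sign; orient so the weights sum is positive.
    if weights.sum() < 0:
        weights = -weights
    scores = Z @ weights  # weighted linear combination of annotations
    return weights, scores

# Simulated example (hypothetical data): a hidden functional status shifts
# three annotations by different amounts; the fourth is pure noise.
rng = np.random.default_rng(0)
n_variants = 2000
is_functional = rng.random(n_variants) < 0.2
effects = np.array([2.0, 1.5, 1.0, 0.0])
X = is_functional[:, None] * effects + rng.normal(size=(n_variants, 4))

weights, scores = spectral_meta_score(X)
print("weights:", np.round(weights, 3))
print("mean score, functional:    ", scores[is_functional].mean())
print("mean score, non-functional:", scores[~is_functional].mean())
```

In this simulation the hidden status is used only to generate the data and to inspect the result afterwards; the weights themselves are computed from the annotation matrix alone, which is the sense in which such an approach is unsupervised.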

Related resources

Interpreting noncoding genetic variation in complex traits and human disease

Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has focused primarily on protein-coding variants, owing to the difficulty of interpreting noncoding mutations. This picture has changed with advances in the systematic annotation of functional noncoding elements. Evolutionary conservation, functional genomics, chromatin state, se...

Adaptive Spectral Separation Two Layer Coding with Error Concealment for Cell Loss Resilience

This paper addresses the issue of cell loss and its consequent effect on video quality in a packet video system, and examines possible compensatory measures. In the system's encoder, adaptive spectral separation is used to develop a two-layer coding scheme comprising a high-priority layer to carry essential video data and a low-priority layer with data to enhance the video image. A two-step er...

Conserved introns reveal novel transcripts in Drosophila melanogaster.

Noncoding RNAs that are, like mRNAs, spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far, no computational methods exist to predict such RNAs from genomic sequences only. The subclass of these transcripts that is evolutionarily conserved usu...

LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations

In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotati...

O-31: AMH and AMHR2 Genetic Variants in Chinese Women with Primary Ovarian Insufficiency and Normal Age at Natural Menopause

Background: To investigate the role of the anti-Müllerian hormone (AMH) signalling pathway in the pathophysiology of idiopathic primary ovarian insufficiency (POI) and age at natural menopause (ANM) using a genetic approach. Materials and Methods: DNA sequencing was used to detect the genotype distribution and allele frequency of the genes AMH and AMH receptor II (AMHR2) in 120 cases of idiopathic P...

Publication date: 2015